Reducing deadline miss rate for grid workloads running in virtual machines: a deadline-aware and adaptive approach

Author

  • Omer Khalid
Abstract

This thesis explores three major areas of research: integration of virtualization into scientific grid infrastructures, evaluation of the virtualization overhead on the performance of HPC grid jobs, and optimization of job execution times to increase throughput by reducing the job deadline miss rate. Integrating virtualization into the grid so that on-demand virtual machines are deployed for jobs transparently to end users, and with minimal impact on the existing system, poses a significant challenge. It involves creating the virtual machines, decompressing the operating system image, adapting the virtual environment to satisfy the job's software requirements, continuously updating the job state once it is running without modifying the batch system or the existing grid middleware, and finally bringing the host machine back to a consistent state. To facilitate this research, an existing pilot job framework already in production was modified to deploy virtual machines on demand on the grid, using the virtualization administrative domain to handle all I/O in order to increase network throughput. This approach limits the impact of changes on the existing grid infrastructure while leveraging the execution and performance isolation capabilities of virtualization for job execution. This work led to an evaluation of the scheduling strategies used by the Xen hypervisor, measuring the sensitivity of job performance to the amount of CPU and memory allocated under various configurations. Virtualization overhead, however, is also a critical factor in determining job execution times. Grid jobs have diverse requirements for machine resources such as CPU, memory, and network, and they depend on other jobs to meet their deadlines, since the input of one job can be the output of a previous job. A novel resource provisioning model was devised to decrease the impact of virtualization overhead on job execution.
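The VM lifecycle described above (create the VM, decompress the image, adapt the environment, report job state, restore the host) can be pictured as a minimal Python sketch. The class `PilotJobLifecycle`, its step names, and its state strings are hypothetical and do not come from the thesis or from any real pilot job framework; each step is only recorded in a log. What it illustrates is that host restoration sits in a `finally` clause, so it runs whether the job payload succeeds or fails.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PilotJobLifecycle:
    """Hypothetical model of the VM-based pilot job steps; steps are only logged."""
    log: List[str] = field(default_factory=list)
    state: str = "PENDING"

    def _set_state(self, state: str) -> None:
        # Stand-in for reporting job state to the pilot framework
        # without touching the batch system itself.
        self.state = state
        self.log.append(f"state={state}")

    def run(self, payload: Callable[[], None]) -> str:
        # Hypothetical preparation steps, in the order the abstract lists them.
        steps = ["create_vm", "decompress_image", "adapt_environment"]
        try:
            for step in steps:
                self.log.append(step)
            self._set_state("RUNNING")
            payload()  # the actual grid job
            self._set_state("DONE")
        except Exception:
            # Any failure in preparation or payload marks the job as failed.
            self._set_state("FAILED")
        finally:
            # Always bring the host back to a consistent state.
            self.log.append("restore_host")
        return self.state
```

For example, `PilotJobLifecycle().run(lambda: None)` returns `"DONE"`, while a payload that raises returns `"FAILED"`; in both cases the last log entry is `"restore_host"`.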
Finally, dynamic deadline-aware optimization algorithms were introduced that use exponential smoothing and rate limiting to predict job failure rates based on static and dynamic virtualization overhead. Statistical techniques were also integrated into the optimization algorithm to flag jobs at risk of missing their deadlines and to take preventive action, increasing overall job throughput.
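A minimal Python sketch of the two ingredients named above, exponential smoothing with a rate limit and a deadline-risk flag, might look as follows. It is a simplification under stated assumptions, not the thesis's algorithm: it smooths a per-job virtualization overhead (expressed as a fraction of native run time) rather than a failure rate, and `OverheadPredictor`, `at_risk`, `alpha`, and `max_step` are hypothetical names and values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OverheadPredictor:
    """Hypothetical predictor: exponential smoothing with a rate limit per update."""
    alpha: float = 0.3       # smoothing factor in (0, 1]; assumed value
    max_step: float = 0.05   # largest allowed change of the estimate per update; assumed
    estimate: Optional[float] = None

    def update(self, observed: float) -> float:
        if self.estimate is None:
            # The first sample initialises the estimate.
            self.estimate = observed
            return self.estimate
        # Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_{t-1}
        smoothed = self.alpha * observed + (1 - self.alpha) * self.estimate
        # Rate limiting: bound how far a single sample can move the estimate.
        delta = max(-self.max_step, min(self.max_step, smoothed - self.estimate))
        self.estimate += delta
        return self.estimate


def at_risk(elapsed: float, remaining_native: float, deadline: float, overhead: float) -> bool:
    """Flag a job whose projected finish time, inflated by the predicted
    virtualization overhead, would exceed its deadline (all times in seconds)."""
    projected_finish = elapsed + remaining_native * (1.0 + overhead)
    return projected_finish > deadline
```

Feeding the overhead samples 0.10, 0.12 and 0.40 gives estimates of 0.10, 0.106 and 0.156: the sudden jump to 0.40 is clamped to a 0.05 increase by the rate limit. A job with 3000 s elapsed, 1000 s of native work remaining and a 4100 s deadline is then flagged, since 3000 + 1000 × 1.156 = 4156 > 4100.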



Publication date: 2011